Efficient Large-Scale Multi-Modal Classification

Authors

Douwe Kiela

Edouard Grave

Armand Joulin

Tomas Mikolov

Abstract

While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantities of data quickly. We investigate various methods for performing multi-modal fusion and analyze their trade-offs in terms of classification accuracy and computational efficiency. Our findings indicate that the inclusion of continuous information improves performance over text-only on a range of multi-modal classification tasks, even with simple fusion methods. In addition, we experiment with discretizing the continuous features in order to speed up and simplify the fusion process even further. Our results show that fusion with discretized features outperforms text-only classification, at a fraction of the computational cost of full multi-modal fusion, with the additional benefit of improved interpretability.

Text classification is one of the core problems in machine learning and natural language processing (Borko and Bernick 1963; Sebastiani 2002). It plays a crucial role in important tasks ranging from document retrieval and categorization to sentiment and topic classification (Deerwester et al. 1990; Joachims 1998; Pang and Lee 2008). However, while the incipient Web was largely text-based, the recent decade has seen a surge in multi-modal content: billions of images and videos are posted and shared online every single day. That is, text is either replaced as the dominant modality, as is the case with Instagram posts or YouTube videos, or it is augmented with non-textual content, as with most of today’s web pages. This makes multi-modal classification an important problem. Here, we examine the task of multi-modal classification using neural networks.
We are primarily interested in two questions: what is the best way to combine (i.e., fuse) data from different modalities, and how can we do so in the most efficient manner? We examine various efficient multi-modal fusion methods and investigate ways to speed up the fusion process. In particular, we explore discretizing the continuous features, which leads to much faster training and requires less storage, yet is still able to benefit from the inclusion of multi-modal information. To the best of our knowledge, this work constitutes the first attempt to examine the accuracy/speed trade-off in multi-modal classification; and the first to directly show the value of discretized features in this particular task.

If current trends continue, the Web will become increasingly multi-modal, making the question of multi-modal classification ever more pertinent. At the same time, as the Web keeps growing, we have to be able to efficiently handle ever larger quantities of data, making it important to focus on machine learning methods that can be applied to large-scale scenarios. This work aims to examine these two questions together.

Our contributions are as follows. First, we compare various multi-modal fusion methods, examine their trade-offs, and show that simpler models are often desirable. Second, we experiment with discretizing continuous features in order to speed up and simplify the fusion process even further. Third, we examine learned representations for discretized features and show that they yield interpretability as a beneficial side effect. The work reported here constitutes a solid and scalable baseline for other approaches to follow; our investigation of discretized features shows how multi-modal classification does not necessarily imply a large performance penalty and is feasible in large-scale scenarios.

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
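
The idea of fusing text with discretized continuous features can be illustrated with a minimal sketch. This is not the authors' method: the text above does not specify how the features are discretized or fused, so the uniform binning scheme, the value range, the token format `img_<dim>_b<bin>`, and the function names `discretize` and `fuse` are all assumptions made for illustration. The point is only that, once continuous values become discrete tokens, they can be treated like extra words by a text classifier.

```python
from typing import List, Sequence


def discretize(features: Sequence[float], num_bins: int = 4,
               low: float = 0.0, high: float = 1.0) -> List[str]:
    """Map each continuous feature to a discrete token.

    Assumption (hypothetical scheme): uniform bins over [low, high];
    values outside that range are clipped to the nearest edge.
    """
    tokens = []
    width = (high - low) / num_bins
    for dim, value in enumerate(features):
        clipped = min(max(value, low), high)
        # The upper edge belongs to the last bin, not to a bin of its own.
        bin_id = min(int((clipped - low) / width), num_bins - 1)
        tokens.append(f"img_{dim}_b{bin_id}")
    return tokens


def fuse(text_tokens: Sequence[str],
         image_features: Sequence[float]) -> List[str]:
    """Fuse the two modalities by appending image tokens to the text tokens."""
    return list(text_tokens) + discretize(image_features)


if __name__ == "__main__":
    # Three text tokens plus a toy 3-dimensional visual feature vector.
    print(fuse(["red", "sports", "car"], [0.10, 0.95, 0.50]))
```

In this sketch the fused token list could be fed to any bag-of-words style classifier. Because each image token names a feature dimension and a bin, its learned weight or embedding can be inspected directly, which is one plausible reading of the interpretability benefit mentioned above.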

Similar Resources

Robust multi-site MR data processing: iterative optimization of bias correction, tissue classification, and registration

A robust multi-modal tool for automated registration, bias correction, and tissue classification has been implemented for large-scale heterogeneous multi-site longitudinal MR data analysis. This work focused on improving an iterative optimization framework between bias correction, registration, and tissue classification, inspired by previous work. The primary contributions are robustness...

Robust Multi-modal and Multi-unit Feature Level Fusion of Face and Iris Biometrics

Multi-biometrics has recently emerged as a means of more robust and efficient personal verification and identification. By exploiting information from multiple sources at various levels (feature, score, rank, or decision), the false acceptance and rejection rates can be considerably reduced. Among these, feature-level fusion is a relatively understudied problem. This paper addresses the feature l...

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Large-scale image annotation is a challenging task in image content analysis, which aims to annotate each image of a very large dataset with multiple class labels. In this paper, we focus on two main issues in large-scale image annotation: 1) how to learn stronger features for multifarious images; 2) how to annotate an image with an automatically-determined number of class labels. To address th...

Damage detection of structures using modal strain energy with Guyan reduction method

The subject of structural health monitoring and damage identification of structures at the earliest possible stage has been a noteworthy topic for researchers in recent years. The modal strain energy (MSE) based index is one of the efficient methods commonly used for detecting damage in structures. It is also more effective and economical to employ some methods for reducing the degrees ...

A TWO-STAGE DAMAGE DETECTION METHOD FOR LARGE-SCALE STRUCTURES BY KINETIC AND MODAL STRAIN ENERGIES USING HEURISTIC PARTICLE SWARM OPTIMIZATION

In this study, an approach for damage detection of large-scale structures is developed by employing kinetic and modal strain energies together with the Heuristic Particle Swarm Optimization (HPSO) algorithm. Kinetic strain energy is employed to determine the location of structural damages. After determining the suspected damage locations, the severity of damages is obtained based on variations of modal ...

Journal title:

CoRR

Volume abs/1802.02892, Issue -

Pages -

Publication date 2017
